Search for: All records

Creators/Authors contains: "Hu, Yang"

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo period.

  1. Free, publicly-accessible full text available December 1, 2026
  2. Free, publicly-accessible full text available July 4, 2026
  3. Ozay, Necmiye; Balzano, Laura; Panagou, Dimitra; Abate, Alessandro (Ed.)
    The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from computational issues that obstruct their real-world implementation. In this paper, we consider MDPs with low-rank structure, where the transition kernel can be written as a linear product of a feature map and factors. We introduce *duple perturbation* robustness, i.e., perturbation on both the feature map and the factors, via a novel characterization of (ξ, η)-ambiguity sets featuring computational efficiency. Our low-rank robust MDP formulation is compatible with the low-rank function-representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence-rate guarantee. Lastly, the robustness of the proposed approach is justified by numerical experiments, including classical control tasks with continuous state-action spaces. (A minimal numerical sketch of a duple-perturbation backup appears after this list.)
    Free, publicly-accessible full text available June 4, 2026
  4. Free, publicly-accessible full text available April 25, 2026
  5. Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
    Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy with *only* experiences from another behavior policy that is potentially unknown. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the *curse of horizon*. However, the major bottleneck in applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a *linear representation* of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex, non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient use of historical data. We highlight that our algorithm, **SpectralDICE**, is the first to leverage a linear representation of the primal-dual variables that is both computationally and sample efficient, a claim supported by a rigorous theoretical sample-complexity guarantee and a thorough empirical evaluation on various benchmarks. (A toy tabular illustration of the primal-dual objects follows this list.)
    Free, publicly-accessible full text available May 3, 2026
  6. Free, publicly-accessible full text available October 1, 2026
  7. Free, publicly-accessible full text available October 16, 2026
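
To make the *duple perturbation* idea in record 3 concrete, here is a minimal sketch of a pessimistic Bellman backup on a toy low-rank MDP. It is not the paper's algorithm: the 2-norm perturbation balls, the first-order Cauchy-Schwarz penalty, and every constant below are illustrative assumptions standing in for the paper's (ξ, η)-ambiguity sets.

```python
# Toy duple-perturbation backup -- a sketch, not the paper's method.
# The 2-norm balls and first-order penalty are stand-in assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 3, 4            # states, actions, feature dimension
gamma = 0.9                  # discount factor
xi, eta = 0.02, 0.02         # radii kept small so the operator stays a contraction

# Low-rank kernel P(s'|s,a) = phi(s,a) . mu(s'), normalized so each
# row is a valid distribution over s'.
phi = rng.random((S, A, d))
mu = rng.random((d, S))
phi /= np.einsum('sad,dk->sa', phi, mu)[..., None]
R = rng.random((S, A))

def duple_robust_backup(V):
    """One pessimistic Bellman backup under perturbations of phi AND mu."""
    nominal = np.einsum('sad,dk,k->sa', phi, mu, V)   # E_P[V(s')]
    # Worst case over ||d_phi||_2 <= xi and ||d_mu||_2 <= eta, bounded
    # to first order via Cauchy-Schwarz -- a crude surrogate for the
    # paper's (xi, eta)-ambiguity sets.
    penalty = xi * np.linalg.norm(mu @ V) \
            + eta * np.linalg.norm(phi, axis=-1) * np.linalg.norm(V)
    return (R + gamma * (nominal - penalty)).max(axis=-1)

V = np.zeros(S)
for _ in range(1000):                    # pessimistic value iteration
    V_next = duple_robust_backup(V)
    if np.max(np.abs(V_next - V)) < 1e-10:
        break
    V = V_next
print("pessimistic state values:", np.round(V, 3))
```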
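Similarly, a hedged, tabular illustration of the primal-dual objects behind record 5: the Q-function (primal) and the discounted occupancy (dual) are computed here by exact linear solves, which stand in for SpectralDICE's learned spectral representation. The correction ratio then reweights off-policy samples with one weight per state-action pair, avoiding the per-step importance products behind the curse of horizon. The MDP, policies, and sample sizes are all made up for illustration.

```python
# Tabular toy of the DICE primal-dual picture -- not SpectralDICE itself.
# Exact linear solves replace the paper's learned spectral representation.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.95
n = S * A

P = rng.dirichlet(np.ones(S), size=(S, A))      # P(s'|s,a)
r = rng.random((S, A))                          # rewards
pi = rng.dirichlet(np.ones(A), size=S)          # target policy (illustrative)
beta = rng.dirichlet(np.ones(A), size=S)        # behavior policy (illustrative)
d0 = np.full(S, 1.0 / S)                        # initial state distribution

def primal_dual(policy):
    """Q-function (primal) and discounted occupancy (dual) of a policy."""
    # State-action transition matrix P_pi[(s,a),(s',a')] = P(s'|s,a) pi(a'|s').
    Ppi = (P[:, :, :, None] * policy[None, None, :, :]).reshape(n, n)
    q = np.linalg.solve(np.eye(n) - gamma * Ppi, r.ravel())
    d0_sa = (d0[:, None] * policy).ravel()
    occ = (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * Ppi.T, d0_sa)
    return q, occ

q_pi, d_pi = primal_dual(pi)
_, d_b = primal_dual(beta)

# Primal and dual views of the policy value agree:
rho_primal = (1 - gamma) * (d0[:, None] * pi).ravel() @ q_pi
rho_dual = d_pi @ r.ravel()
assert np.isclose(rho_primal, rho_dual)

# DICE-style OPE: one correction weight per (s,a), with no per-step
# importance products -- hence no curse of horizon.
tau = d_pi / d_b                                # stationary correction ratio
idx = rng.choice(n, size=20000, p=d_b)          # draws from behavior occupancy
rho_hat = np.mean(tau[idx] * r.ravel()[idx])
print(f"true value {rho_dual:.4f}  off-policy estimate {rho_hat:.4f}")
```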